In this section, I perform non-thresholded Gene Set Enrichment Analysis (GSEA) on the ranked list of genes obtained from Assignment 2, which analyzed the GSE173955 dataset comparing Alzheimer’s Disease (AD) samples against controls.
Unlike Over-Representation Analysis (ORA), which was used in Assignment 2 and requires a cutoff to define significant genes, GSEA uses the whole ranked list of genes. It does not depend on a threshold, so it can detect weaker but coordinated expression changes across related gene groups. This method was also recommended in the BCB420 lectures, because it recovers more patterns from the ranked data than an analysis restricted to a small set of significant genes.
I used the GSEA software (version 4.3.2) from the Broad Institute to run the analysis. The method selected was Preranked GSEA, which allows the user to input a custom-ranked list of genes. I ranked genes by their F-statistic, calculated from the differential expression results using edgeR in Assignment 2.
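# Convert ranked_list.csv to the two-column, tab-delimited .rnk format used by GSEA Preranked
ranked_df <- read.csv("~/Yue_Chen/ranked_list.csv")
# Keep the gene symbol column ("X") and the edgeR F-statistic ("F")
rnk <- ranked_df[, c("X", "F")]
# Drop genes with missing values
rnk <- na.omit(rnk)
# Use write.table rather than write.csv so the output is tab-separated with no row names, header, or quotes
write.table(rnk, "~/Yue_Chen/ranked_list.rnk", sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)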
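F-statistics are non-negative, so a rank file built from F alone orders genes by strength of evidence but not by direction of change. A minimal sketch of one common alternative, which is not what was run in this analysis, multiplies the score by the sign of the log fold change. The `logFC` column, the gene names, and all values below are assumptions for illustration and are not taken from the Assignment 2 output.

```r
# Toy differential expression table; gene names and values are made up for illustration.
# Columns X and F follow ranked_list.csv; logFC is an assumed extra column.
toy_df <- data.frame(
  X     = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  F     = c(12.5, 8.1, 0.4, 20.3),
  logFC = c(1.2, -0.9, 0.1, -2.4)
)

# Signed score: magnitude from F, direction from the sign of logFC
toy_df$signed_rank <- sign(toy_df$logFC) * toy_df$F

# Order from the most positive to the most negative score
signed_rnk <- toy_df[order(toy_df$signed_rank, decreasing = TRUE), c("X", "signed_rank")]
print(signed_rnk)
```

With a signed score like this, gene sets at the top and bottom of the list can be read as enriched in opposite directions.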
The gene set collection used was:
- “Human_GOBP_AllPathways_noPFOCR_no_GO_iea_March_01_2025_symbol.gmt”
- Downloaded from the Bader Lab GeneSets repository
GSEA parameters were set as:
- Permutations: 1000, for stable NES and FDR scores
- Collapse: No_Collapse, because gene symbols were used directly
- Enrichment statistic: Weighted, which emphasizes strongly ranked genes
- Max/Min size: 200 / 15, to filter out overly large or tiny pathways
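To make the "Weighted" enrichment statistic concrete, here is a minimal sketch of the running-sum enrichment score described by Subramanian et al. (2005). It is a simplified illustration rather than the GSEA software's implementation: the function name `enrichment_score` and the toy scores are hypothetical, and the permutation step that turns ES into NES and FDR is not included.

```r
# Running-sum enrichment score (ES) for one gene set.
# scores:   named numeric vector of ranking scores (names are gene IDs)
# gene_set: character vector of gene IDs in the set
# p:        weight exponent; p = 1 is the weighted statistic, p = 0 the classic one
enrichment_score <- function(scores, gene_set, p = 1) {
  scores <- sort(scores, decreasing = TRUE)
  in_set <- names(scores) %in% gene_set
  stopifnot(any(in_set), any(!in_set))

  # Hits step up in proportion to |score|^p; misses step down by a constant
  hit_weights <- ifelse(in_set, abs(scores)^p, 0)
  step_hit    <- hit_weights / sum(hit_weights)
  step_miss   <- ifelse(in_set, 0, 1 / sum(!in_set))

  # ES is the running-sum value with the largest absolute deviation from zero
  running <- cumsum(step_hit - step_miss)
  unname(running[which.max(abs(running))])
}

# Toy ranked list: made-up genes and scores
toy_scores <- c(g1 = 10, g2 = 8, g3 = 5, g4 = 3, g5 = 2, g6 = 1)
es_top    <- enrichment_score(toy_scores, c("g1", "g2"))  # set at the top of the list
es_bottom <- enrichment_score(toy_scores, c("g5", "g6"))  # set at the bottom of the list
print(c(es_top = es_top, es_bottom = es_bottom))
```

A set concentrated at the top of the list gives a positive ES and a set concentrated at the bottom gives a negative ES, which is how GSEA assigns gene sets to the na_pos and na_neg reports.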
After I ran the GSEA, I found that 6308 out of 6309 gene sets were enriched in the positive direction (na_pos), and only one gene set was enriched in the negative direction (na_neg). Among the positive ones, 3138 gene sets had FDR < 25%, which means they are likely statistically meaningful.
The top 3 scoring gene sets in the positive direction (na_pos) were:
- ENDOCARDIAL CUSHION FORMATION (NES = 1.72, FDR q-val = 0.229)
- ACTIVATED PKN1 STIMULATES TRANSCRIPTION OF AR (ANDROGEN RECEPTOR) REGULATED GENES KLK2 AND KLK3 (NES = 1.69, FDR q-val = 0.250)
- NEGATIVE REGULATION OF INTRACELLULAR PROTEIN TRANSPORT (NES = 1.68, FDR q-val = 0.173)
The top scoring gene set in the negative direction (na_neg) was:
- Negative regulation of Toll-like receptor signaling pathway (NES = -0.98, FDR q-val = 0.542)
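The filtering step can be sketched in R as follows. The table holds only the three top na_pos gene sets reported above, entered by hand. The column names are an assumption: they mimic what `read.delim()` would produce from a GSEA report, where a header such as "FDR q-val" becomes `FDR.q.val`.

```r
# The three top na_pos gene sets reported above, entered by hand as a small table
top_pos <- data.frame(
  NAME = c(
    "ENDOCARDIAL CUSHION FORMATION",
    "ACTIVATED PKN1 STIMULATES TRANSCRIPTION OF AR (ANDROGEN RECEPTOR) REGULATED GENES KLK2 AND KLK3",
    "NEGATIVE REGULATION OF INTRACELLULAR PROTEIN TRANSPORT"
  ),
  NES       = c(1.72, 1.69, 1.68),
  FDR.q.val = c(0.229, 0.250, 0.173)
)

# Keep gene sets strictly below the FDR threshold, then sort by NES
fdr_cutoff <- 0.25
passing <- top_pos[top_pos$FDR.q.val < fdr_cutoff, ]
passing <- passing[order(passing$NES, decreasing = TRUE), ]
print(passing[, c("NAME", "NES", "FDR.q.val")])
```

With the rounded values shown here, the PKN1 gene set sits exactly at 0.250 and does not pass a strict `<` comparison, so borderline gene sets are best checked against the unrounded q-values in the report.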
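Almost all of the enriched gene sets are in the na_pos group. Because the F-statistic used for ranking is non-negative and does not encode the direction of change, this more likely reflects pathways whose genes are strongly differentially expressed between AD and control samples than pathways that are specifically upregulated in AD. Even though the top results are not typical brain-related pathways, some of them might still be linked to signal transduction or shared cell mechanisms.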
In A2, I used a threshold of 0.05 for the ORA, which gave me only 36 top results. There is also no strong evidence that 0.05 is the right threshold to use. GSEA is more global: it works on the whole ranked list and reports gene sets in order of decreasing significance. This can make the result more reliable, because genes that are not very significant on their own can still be part of a bigger trend in a pathway. Qualitatively, the GSEA results included many more pathways than the ORA in Assignment 2, and they covered broader biological processes.
Honestly, it’s not easy to directly compare the two methods. As mentioned above, GSEA uses the whole ranked list while ORA only uses a small cutoff-based group, so the logic behind them is already quite different. ORA gives fewer results that are often stronger but limited, while GSEA captures broader trends even when individual genes are not significant. They serve different purposes and answer slightly different questions, so their outputs should not be compared directly. Instead, they can complement each other.
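To make the contrast concrete, ORA can be written as a one-sided hypergeometric test on a gene list defined by the cutoff. The sketch below is an illustration only: every count is made up and none of them come from the Assignment 2 results.

```r
# ORA as a one-sided hypergeometric test.
# All counts are made-up illustration values, not Assignment 2 results.
universe_size <- 15000  # genes tested in total
n_significant <- 500    # genes passing the significance cutoff
set_size      <- 100    # genes annotated to the pathway
overlap       <- 10     # significant genes that fall in the pathway

# Overlap expected by chance alone
expected_overlap <- n_significant * set_size / universe_size

# P(X >= overlap) when drawing n_significant genes from the universe
p_ora <- phyper(overlap - 1, set_size, universe_size - set_size,
                n_significant, lower.tail = FALSE)
print(c(expected = expected_overlap, p_value = p_ora))
```

Only the counts enter the test, so a gene just below the cutoff contributes nothing, whereas in GSEA every gene contributes through its position in the ranked list.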
To build the enrichment map in Cytoscape, the parameters I used were:
- FDR q-value cutoff: 0.1
- Nominal p-value cutoff: 0.01
- Similarity metric: Jaccard(50%) + Overlap (50%)
- Edge cutoff: 0.375
- Layout: Prefuse Force Directed Layout
The enrichment map has:
- 299 gene sets
- 1116 edges
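The combined similarity metric and the edge cutoff can be illustrated with a short sketch. It assumes the combined score is a weighted average of the overlap coefficient and the Jaccard coefficient with a 50/50 weighting, as in the settings above. The function name `combined_similarity` and the toy gene sets are hypothetical, and this is not the EnrichmentMap app's own code.

```r
# Combined similarity between two gene sets: k * overlap + (1 - k) * Jaccard
combined_similarity <- function(a, b, k = 0.5) {
  a <- unique(a)
  b <- unique(b)
  shared  <- length(intersect(a, b))
  jaccard <- shared / length(union(a, b))        # shared genes / all genes in either set
  overlap <- shared / min(length(a), length(b))  # shared genes / size of the smaller set
  k * overlap + (1 - k) * jaccard
}

# Toy gene sets with made-up gene names
set_a <- c("G1", "G2", "G3", "G4")
set_b <- c("G3", "G4", "G5", "G6", "G7")

sim <- combined_similarity(set_a, set_b)
edge_cutoff <- 0.375
keep_edge <- sim > edge_cutoff
print(c(similarity = sim, keep_edge = keep_edge))
```

Here the two sets share 2 genes, giving a Jaccard coefficient of 2/7 and an overlap coefficient of 2/4, so the combined score is above the 0.375 cutoff and the two nodes would be connected.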
The raw network, before any layout or annotation, looks roughly like this:
Figure 1: Raw enrichment map as generated by Cytoscape before layout or annotation.
To annotate the network, I used the AutoAnnotate app in Cytoscape. This tool grouped similar gene sets together and gave each group a name using common words found in the pathways.
I used the default parameters, as below:
- Label Column: GS_DESCR
- Max words per label: 3
- Minimum word occurrence: 1
- Adjacent word bonus: 8
- Amount of clusters: middle
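The idea of naming a cluster from its most common words can be sketched as below. This is a deliberately simplified illustration and not AutoAnnotate's actual algorithm: it ignores the adjacent word bonus and any stop-word filtering, and the function name `label_cluster` and the toy descriptions are hypothetical.

```r
# Label a cluster with its most frequent words (simplified word-count approach)
label_cluster <- function(descriptions, max_words = 3) {
  # Split descriptions into lowercase words
  words <- unlist(strsplit(tolower(descriptions), "[^a-z0-9]+"))
  words <- words[nchar(words) > 0]

  # Count each word, then order by frequency (ties broken alphabetically)
  counts <- table(words)
  ord <- order(-as.integer(counts), names(counts))
  top <- names(counts)[ord][seq_len(min(max_words, length(ord)))]
  paste(top, collapse = " ")
}

# Toy cluster of gene set descriptions, made up for illustration
toy_cluster <- c(
  "actin filament polymerization",
  "regulation of actin filament polymerization",
  "actin filament depolymerization"
)
cluster_label <- label_cluster(toy_cluster)
print(cluster_label)
```

Labels built this way are a string of frequent words rather than a curated pathway name, which is why some of the automatic labels need manual interpretation.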
The raw annotated network looks like this:
Figure 2: Very hard to read raw auto annotation.
I changed the layout to “Layout Cluster to Minimize Overlap” and modified the word size. Then I hid the node names to make the graph clearer and get my figure ready to be used:
Figure 3: Publication-ready figure annotation.
Below is the theme network:
Figure 4: Collapsed enrichment map showing grouped biological themes.
In my theme network, there are a few small pairs, such as:
- actin filament polymerization – disassembly protein depolymerization
- pid il12 pathway – pid il23 events
- recruitment atm mediated – mll4 mll3 expression
Some pairs still make sense in the context of the Alzheimer’s Disease model. For example:
- actin-related processes are often changed in AD brains
- immune responses such as IL12/IL23 signaling may be linked to microglia activation
But overall, the theme network didn’t produce strong or clear clusters. This may show a limitation in using auto-clustering alone to define biological modules.
There are some novel pathways or themes. For example:
- mll4 mll3 expression and recruitment atm mediated suggest gene regulation and DNA repair roles that were not seen in the previous thresholded analysis.
- Insulin receptor recycling shows up here, which could relate to the insulin signaling dysfunction hypothesis of Alzheimer’s, a topic not well highlighted in thresholded ORA.
The original study by Wang et al. (2022) (PMID: 34970419) focused on how DNA glycosylase MUTYH affects DNA repair and contributes to Alzheimer’s Disease (AD). The paper suggested that oxidative DNA damage, microglial activation, and neuroinflammation are key parts of the disease mechanism.
My GSEA analysis showed both overlapping and new biological processes:
In Assignment 2, I used ORA with a 0.05 FDR cutoff, so the analysis only kept a small set of significant genes. That gave me around 36 enriched terms, mostly related to synapse, immune signaling, and vesicle transport.
With GSEA (non-thresholded), I used the entire ranked gene list, so I found more biological processes. This method is less biased, and includes weaker but coordinated signals that are missed by strict thresholds. For example:
- Mll4/Mll3 expression, ATM recruitment, and insulin recycling did not appear in A2 but were enriched in GSEA.
- Pathways related to metabolism and epigenetic regulation appeared in GSEA but not in ORA.
So qualitatively, GSEA provides a broader picture than thresholded ORA.
Some of the interesting pathways in my results are supported by published papers:
Actin filament organization: Cytoskeleton and actin remodeling are essential for synaptic function. Disruption is common in AD (Sultana et al., 2010).
Insulin signaling: Insulin resistance in the brain is now considered part of AD pathology. Some researchers call AD “Type 3 diabetes” (de la Monte, 2009).
IL-12/IL-23 signaling: These cytokines can activate microglia and have been linked to chronic neuroinflammation in AD (Heneka et al., 2015).
These findings give extra support to my enrichment results and show that the pathways are biologically meaningful.
In this section, I chose to do a post-analysis focusing on transcription factors that may regulate genes from my GSEA result, especially the pathway “Negative Regulation of Intracellular Protein Transport.” Transcription factors are key upstream regulators that control the expression of gene networks. In the context of AD, several TFs have been implicated in regulating inflammation, endoplasmic reticulum (ER) stress responses, and lysosomal-autophagy processes. These biological functions align with many of the GSEA-enriched pathways, suggesting that TF dysregulation could be a contributing factor in the altered gene expression observed in AD brain tissues (Gjoneska et al., 2015).
Here are some key transcription factors implicated in AD pathology:
- NF-κB (RELA subunit): This is a central inflammatory transcription factor known to be activated in AD. It responds to oxidative stress and Aβ accumulation and regulates pro-inflammatory cytokines (Liu et al., 2017). One of its downstream targets is NFKBIA, a feedback inhibitor of NF-κB, which was enriched in my GSEA results (rank #2). NF-κB activation in glial cells has been associated with neurodegeneration and impaired protein trafficking through chronic inflammation.
- XBP1 (X-box Binding Protein 1): XBP1 is a transcription factor activated during ER stress as part of the unfolded protein response (UPR). It has been shown to regulate genes involved in protein folding and degradation, including OS9 (rank #9) and ERLEC1 (rank #1), both of which appeared in my enriched gene list. Overexpression of the spliced form XBP1s has demonstrated protective effects in AD models, including reduced Aβ deposition and improved cognitive function (Velloso et al., 2021).
- ATF6 (Activating Transcription Factor 6): Like XBP1, ATF6 is an ER stress sensor that promotes transcription of genes that mitigate protein misfolding. It has been reported that activating ATF6 reduces APP expression and amyloid-beta levels in AD mouse models (Lee et al., 2020). ATF6 is believed to regulate genes involved in ER-associated degradation (ERAD), possibly including OS9.
The analysis suggests that several key transcription factors involved in Alzheimer’s disease (NF-κB, XBP1, and ATF6) regulate the expression of genes in my GSEA-enriched pathway, “Negative Regulation of Intracellular Protein Transport”. These genes are involved in essential processes such as inflammation control, ER stress responses, and lysosomal degradation, all of which are disrupted in AD.
By linking these transcription factors to genes like NFKBIA, OS9, and ERLEC1, this post-analysis helps build a clearer picture of the regulatory mechanisms underlying the enrichment patterns observed. It supports the idea that dysregulation at the transcriptional level contributes significantly to the observed impairments in intracellular protein transport and protein clearance in AD.
Scientific Literature
- Gjoneska, E., et al. (2015). Conserved epigenomic signals in mice and humans reveal immune basis of Alzheimer’s disease. Nature, 518(7539), 365–369. https://doi.org/10.1038/nature14252
- Liu, T., et al. (2017). NF-κB signaling in inflammation. Signal Transduction and Targeted Therapy, 2, e17023. https://doi.org/10.1038/sigtrans.2017.23
- Velloso, L. A., et al. (2021). The role of ER stress and the unfolded protein response in Alzheimer’s disease. Journal of Neurochemistry, 156(2), 210–223. https://doi.org/10.1111/jnc.15293
- Lee, J. H., et al. (2020). ATF6 reduces amyloidogenesis via inhibition of BACE1 expression. Neuroscience Letters, 715, 134647. https://doi.org/10.1016/j.neulet.2019.134647
- Nixon, R. A. (2013). The role of autophagy in neurodegenerative disease. Nature Medicine, 19(8), 983–997. https://doi.org/10.1038/nm.3232
- Sultana, R., Perluigi, M., & Butterfield, D. A. (2010). Lipid peroxidation triggers neurodegeneration: A redox proteomics view into the Alzheimer disease brain. Free Radical Biology and Medicine, 50(4), 487–494. https://doi.org/10.1016/j.freeradbiomed.2009.11.017
- Swerdlow, R. H. (2018). Mitochondria and Mitochondrial Cascades in Alzheimer’s Disease. Journal of Alzheimer’s Disease, 62(3), 1403–1416. https://doi.org/10.3233/JAD-170585
Software and Databases Used
- GSEA Software (version 4.3.2): Subramanian, A., et al. (2005). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. PNAS, 102(43), 15545–15550. Broad Institute
- Cytoscape (version 3.9.1): Shannon, P., et al. (2003). Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Research, 13(11), 2498–2504. https://cytoscape.org
- AutoAnnotate App: Kucera, M., Isserlin, R., Arkhangorodsky, A., & Bader, G. D. (2016). AutoAnnotate: A Cytoscape app for summarizing networks with semantic annotations. F1000Research, 5, 1717.
- Human Gene Ontology Gene Sets: Bader Lab GeneSets